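Phishing attacks continue to be a significant threat on the Internet. Prior studies show that it is possible to determine whether a website is phishing just by analyzing its URL more carefully. A major advantage of the URL-based approach is that it can identify a phishing website even before the web page is rendered in the browser, thus avoiding other potential problems such as cryptojacking and drive-by downloads. However, traditional URL-based approaches have their limitations. Blacklist-based approaches are vulnerable to zero-hour phishing attacks, advanced machine-learning-based approaches consume high resources, and other approaches send the URL to a remote server, which compromises the user's privacy. In this paper, we present a layered anti-phishing defense, PhishMatch, which is robust, accurate, inexpensive, and client-side. We design a space-efficient Aho-Corasick algorithm for exact string matching and an n-gram based indexing technique for approximate string matching to detect various cybersquatting techniques in phishing URLs. To reduce false positives, we use a global whitelist and personalized user whitelists. We also determine the context in which the URL is visited and use that information to classify the input URL more accurately. The last component of PhishMatch involves a machine learning model and controlled search engine queries to classify the URL. A prototype plugin of PhishMatch, developed for the Chrome browser, was found to be fast and lightweight. Our evaluation shows that PhishMatch is both efficient and effective.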
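The approximate-matching layer described in this abstract can be illustrated with a small sketch. It is not PhishMatch's implementation: the class name `NgramIndex`, the character-trigram size, the Jaccard similarity measure, and the 0.5 threshold are all assumptions chosen for illustration. It only shows how an n-gram index over whitelisted domains can flag a look-alike domain.

```python
from collections import defaultdict


def ngrams(text, n=3):
    """Return the set of character n-grams of text, padded with ^ and $."""
    padded = f"^{text}$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


class NgramIndex:
    """Hypothetical inverted index from n-grams to whitelisted domains."""

    def __init__(self, domains, n=3):
        self.n = n
        self.grams = {d: ngrams(d, n) for d in domains}
        self.postings = defaultdict(set)
        for domain, grams in self.grams.items():
            for gram in grams:
                self.postings[gram].add(domain)

    def lookup(self, query, threshold=0.5):
        """Return (domain, Jaccard similarity) pairs at or above threshold."""
        q = ngrams(query, self.n)
        # Only domains sharing at least one n-gram can be similar.
        candidates = set()
        for gram in q:
            candidates |= self.postings.get(gram, set())
        results = []
        for domain in candidates:
            grams = self.grams[domain]
            similarity = len(q & grams) / len(q | grams)
            if similarity >= threshold:
                results.append((domain, similarity))
        return sorted(results, key=lambda pair: -pair[1])


def classify(domain, index):
    """Exact whitelist hit, near miss (possible squatting), or unknown."""
    if domain in index.grams:
        return "whitelisted"
    return "suspicious" if index.lookup(domain) else "unknown"
```

With an index built from `["paypal.com", "google.com"]`, the exact domain is treated as whitelisted, while a near miss such as `paypa1.com` shares most trigrams with `paypal.com` and is flagged. A domain with little overlap falls through to later layers, which the abstract says involve a machine learning model and search engine queries.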
State-of-the-art automatic augmentation methods (e.g., AutoAugment and RandAugment) for visual recognition tasks diversify training data using a large set of augmentation operations. The range of magnitudes of many augmentation operations (e.g., brightness and contrast) is continuous. Therefore, to make search computationally tractable, these methods use fixed and manually-defined magnitude ranges for each operation, which may lead to sub-optimal policies. To answer the open question on the importance of magnitude ranges for each augmentation operation, we introduce RangeAugment that allows us to efficiently learn the range of magnitudes for individual as well as composite augmentation operations. RangeAugment uses an auxiliary loss based on image similarity as a measure to control the range of magnitudes of augmentation operations. As a result, RangeAugment has a single scalar parameter for search, image similarity, which we simply optimize via linear search. RangeAugment integrates seamlessly with any model and learns model- and task-specific augmentation policies. With extensive experiments on the ImageNet dataset across different networks, we show that RangeAugment achieves performance competitive with state-of-the-art automatic augmentation methods while using 4-5 times fewer augmentation operations. Experimental results on semantic segmentation, object detection, foundation models, and knowledge distillation further show RangeAugment's effectiveness.
In this work, we explore a useful but often neglected methodology for robustness analysis of text generation evaluation metrics: stress tests with synthetic data. Basically, we design and synthesize a wide range of potential errors and check whether they result in a commensurate drop in the metric scores. We examine a range of recently proposed evaluation metrics based on pretrained language models, for the tasks of open-ended generation, translation, and summarization. Our experiments reveal interesting insensitivities, biases, or even loopholes in existing metrics. For example, we find that BERTScore ignores truncation errors in summarization, and MAUVE (built on top of GPT-2) is insensitive to errors at the beginning of generations. Further, we investigate the reasons behind these blind spots and suggest practical workarounds for a more reliable evaluation of text generation.
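The stress-test methodology above can be sketched in a few lines. The two toy metrics here (unigram precision and unigram recall) are stand-ins chosen for illustration, not BERTScore or MAUVE, and the helper names `truncate` and `stress_test` are hypothetical. The point is only the procedure: inject a synthetic error and check whether the score drops.

```python
def unigram_precision(candidate, reference):
    """Fraction of candidate tokens that also occur in the reference."""
    cand = candidate.split()
    ref = set(reference.split())
    if not cand:
        return 0.0
    return sum(token in ref for token in cand) / len(cand)


def unigram_recall(candidate, reference):
    """Fraction of reference tokens that also occur in the candidate."""
    ref = reference.split()
    cand = set(candidate.split())
    if not ref:
        return 0.0
    return sum(token in cand for token in ref) / len(ref)


def truncate(text, keep_ratio=0.5):
    """Synthetic error: keep only the leading part of the text."""
    tokens = text.split()
    keep = max(1, int(len(tokens) * keep_ratio))
    return " ".join(tokens[:keep])


def stress_test(metric, text, reference, perturb):
    """Return the score drop caused by the perturbation (clean minus noisy)."""
    return metric(text, reference) - metric(perturb(text), reference)
```

In this toy setting, truncating a perfect summary leaves the precision-style metric unchanged (a blind spot), while the recall-style metric drops. A drop of zero under a clearly harmful perturbation is the kind of insensitivity the abstract describes.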
Multi-lingual language models (LMs), such as mBERT, XLM-R, mT5, and mBART, have been remarkably successful in enabling natural language tasks in low-resource languages through cross-lingual transfer from high-resource ones. In this work, we try to better understand how such models, specifically mT5, transfer *any* linguistic and semantic knowledge across languages, even though no explicit cross-lingual signals are provided during pre-training. Rather, only unannotated texts from each language are presented to the model separately and independently of one another, and the model appears to implicitly learn cross-lingual connections. This raises several questions that motivate our study, such as: Are the cross-lingual connections between every language pair equally strong? What properties of source and target language impact the strength of cross-lingual transfer? Can we quantify the impact of those properties on the cross-lingual transfer? In our investigation, we analyze a pre-trained mT5 to discover the attributes of cross-lingual connections learned by the model. Through a statistical interpretation framework over 90 language pairs across three tasks, we show that transfer performance can be modeled by a few linguistic and data-derived features. These observations enable us to interpret the cross-lingual understanding of the mT5 model. Through these observations, one can choose the best source language for a task and anticipate its training data demands. A key finding of this work is that similarities in syntax, morphology, and phonology are good predictors of cross-lingual transfer, significantly more so than lexical similarity alone. For a given language, we are able to predict zero-shot performance, which increases on a logarithmic scale with the number of few-shot target language data points.
Finetuning image-text models such as CLIP achieves state-of-the-art accuracies on a variety of benchmarks. However, recent works like WiseFT (Wortsman et al., 2021) and LP-FT (Kumar et al., 2022) have shown that even subtle differences in the finetuning process can lead to surprisingly large differences in the final performance, both for in-distribution (ID) and out-of-distribution (OOD) data. In this work, we show that a natural and simple approach of mimicking contrastive pretraining consistently outperforms alternative finetuning approaches. Specifically, we cast downstream class labels as text prompts and continue optimizing the contrastive loss between image embeddings and class-descriptive prompt embeddings (contrastive finetuning). Our method consistently outperforms baselines across 7 distribution shifts, 6 transfer learning, and 3 few-shot learning benchmarks. On WILDS-iWILDCam, our proposed approach FLYP outperforms the top of the leaderboard by $2.3\%$ ID and $2.7\%$ OOD, giving the highest reported accuracy. Averaged across 7 OOD datasets (2 WILDS and 5 ImageNet associated shifts), FLYP gives gains of $4.2\%$ OOD over standard finetuning and outperforms the current state of the art (LP-FT) by more than $1\%$ both ID and OOD. Similarly, on 3 few-shot learning benchmarks, our approach gives gains up to $4.6\%$ over standard finetuning and $4.4\%$ over the state of the art. In total, these benchmarks establish contrastive finetuning as a simple, intuitive, and state-of-the-art approach for supervised finetuning of image-text models like CLIP. Code is available at https://github.com/locuslab/FLYP.
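The contrastive finetuning objective described above can be sketched as a symmetric contrastive loss between image embeddings and the embeddings of each image's class prompt. This is a minimal pure-Python illustration, not the code from the FLYP repository. It assumes the embeddings are already computed, that row k of both inputs belongs to the same example, and that the temperature is 0.07. All function names are hypothetical.

```python
import math


def normalize(vector):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def log_softmax(row):
    """Numerically stable log-softmax of a list of logits."""
    peak = max(row)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_sum for x in row]


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric cross-entropy over the image-text cosine similarity matrix."""
    images = [normalize(v) for v in image_embs]
    texts = [normalize(v) for v in text_embs]
    n = len(images)
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    # Image-to-text direction: each image should pick its own prompt.
    image_to_text = -sum(log_softmax(logits[k])[k] for k in range(n)) / n
    # Text-to-image direction: each prompt should pick its own image.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    text_to_image = -sum(log_softmax(columns[k])[k] for k in range(n)) / n
    return (image_to_text + text_to_image) / 2
```

In practice the text embeddings would come from encoding a prompt built from each class label, and the loss would be minimized with respect to both encoders. Here the function only evaluates the loss: matched pairs give a value near zero and mismatched pairs give a large value.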
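In the modern world, applications of data science and analytics to optimize or predict outcomes are ubiquitous. Data science and analytics have optimized almost every domain that exists in the market. In our survey, we focus on how analytics has been adopted in the field of sports and how it has contributed to the transformation of the game, from the assessment of on-field players and their selection, to the prediction of the winning team, to the marketing of tickets and the business aspects of big sports tournaments. We present the analytical tools, algorithms, and methodologies adopted in sports analytics for different sports, give our views on them, and compare and contrast these existing approaches. In doing so, we also present the best tools, algorithms, and analytical methodologies to be considered by anyone looking to experiment with sports data and analyze various aspects of the game.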
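The evolution of digital technology and the increasing popularity of sports inspired innovators to take the experience of sports-inclined users to a whole new level by introducing Fantasy Sports Platforms (FSPs). The application of data science and analytics is ubiquitous in the modern world. Data science and analytics open doors to deeper understanding and aid the decision-making process. We firmly believed that we could adopt data science to predict the winning fantasy cricket team on the FSP Dream 11. We built a predictive model that predicts the performance of players in a prospective game. We used a combination of greedy and knapsack algorithms to prescribe a combination of 11 players, creating a fantasy cricket team with the best statistical odds of being the strongest team, thereby giving us a higher chance of winning the bet on the Dream 11 FSP. We used the PyCaret Python library to help us understand and adopt the best regression algorithm for our problem statement and make precise predictions. Further, we used the Plotly Python library to give us visual insights into the performance of the team and its players, accounting for the statistical and subjective factors of a prospective game. The interactive plots helped us bolster the recommendations of our predictive model. You either win big, win small, or lose your bet based on the performance of the players selected for your fantasy team in the prospective game, and our model increases the probability of you winning big.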
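The team-selection step can be illustrated with a knapsack-style dynamic program. The abstract does not spell out how the greedy and knapsack steps are combined, so this sketch substitutes a plain 0/1 knapsack with a fixed team size. The function name `pick_team`, the integer credit costs, and the sample numbers in the note below are all hypothetical, and real platform rules such as role quotas are not modeled.

```python
def pick_team(players, team_size, budget):
    """Choose exactly team_size players within budget, maximizing points.

    players: list of (name, cost, predicted_points) with integer costs.
    Returns (total_points, names), or None if no valid team exists.
    """
    # best[(count, spent)] = (total_points, chosen_names)
    best = {(0, 0): (0.0, ())}
    for name, cost, points in players:
        updates = {}
        # Extend only states built without this player (0/1 knapsack).
        for (count, spent), (total, names) in best.items():
            new_count, new_spent = count + 1, spent + cost
            if new_count > team_size or new_spent > budget:
                continue
            candidate = (total + points, names + (name,))
            key = (new_count, new_spent)
            current = updates.get(key, best.get(key))
            if current is None or candidate[0] > current[0]:
                updates[key] = candidate
        best.update(updates)
    finals = [value for (count, _), value in best.items() if count == team_size]
    return max(finals, key=lambda value: value[0]) if finals else None
```

For instance, with four hypothetical players costing 9, 8, 10, and 7 credits, a team size of 2, and a budget of 17, the function returns the feasible pair with the highest summed predicted points. The predicted points would come from the regression model mentioned in the abstract.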
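We introduce a robotic assembly system that streamlines the design-to-make workflow, going from a CAD model of a product assembly to a fully programmed and adaptive assembly process. Our system captures (in the CAD tool) the intent of the assembly process for a specific robotic workcell and generates a recipe of task-level instructions. By integrating visual sensing with deep-learned perception models, the robots infer the actions needed to assemble the design from the generated recipe. The perception models are trained directly from simulation, allowing the system to identify the individual parts based on CAD information. We demonstrate the system with a workcell of two robots assembling interlocking 3D part designs. We first build and tune the assembly process in simulation, verifying the generated recipe. Finally, the real robotic workcell assembles the design using the same behavior.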
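Recent isotropic networks, such as ConvMixer and vision transformers, have found significant success across visual recognition tasks, matching or outperforming non-isotropic convolutional neural networks (CNNs). Isotropic architectures are particularly well-suited to cross-layer weight sharing, an effective neural network compression technique. In this paper, we perform an empirical evaluation of methods for sharing parameters in isotropic networks (SPIN). We present a framework to formalize weight sharing design decisions and perform a comprehensive empirical evaluation of this design space. Guided by our experimental results, we propose a weight sharing strategy that generates a family of models with better overall efficiency, in terms of FLOPs and parameters versus accuracy, than traditional scaling methods alone, for example compressing ConvMixer by 1.9x while improving accuracy on ImageNet. Finally, we perform a qualitative study to further understand the behavior of weight sharing in isotropic architectures. The code is available at https://github.com/apple/ml-pin.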
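Cross-layer weight sharing relies on every layer of an isotropic network having the same shape, so one parameter set can serve several layers. The sketch below shows one simple assignment, where consecutive layers share a group. It is an illustrative assumption, not necessarily the strategy the paper proposes, and the names `sequential_sharing` and `parameter_count` are hypothetical.

```python
def sequential_sharing(num_layers, share_rate):
    """Map each layer index to a parameter group.

    Consecutive runs of share_rate layers reuse the same group.
    """
    return [layer // share_rate for layer in range(num_layers)]


def parameter_count(mapping, params_per_layer):
    """Total parameters when each distinct group is stored only once."""
    return len(set(mapping)) * params_per_layer
```

With 8 layers and a share rate of 2, the mapping is `[0, 0, 1, 1, 2, 2, 3, 3]`. Only 4 parameter sets are stored, while the forward pass still runs 8 layers, so the parameter count halves and the FLOPs stay the same.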
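Test-time adaptation (TTA) refers to adapting neural networks to distribution shifts, with access only to unlabeled test samples from the new domain at test time. Prior TTA methods optimize unsupervised objectives such as the entropy of model predictions in TENT [Wang et al., 2021], but it is unclear what exactly makes a good TTA loss. In this paper, we start by presenting a surprising phenomenon: if we attempt to meta-learn the best possible TTA loss over a wide class of functions, then we recover a function that is remarkably similar to (a temperature-scaled version of) the softmax-entropy employed by TENT. This only holds, however, if the classifier we are adapting is trained via cross-entropy; if it is trained via squared loss, a different best TTA loss emerges. To explain this phenomenon, we analyze TTA through the lens of the training loss's convex conjugate. We show that under natural conditions, this (unsupervised) conjugate function can be viewed as a local approximation to the original supervised loss, and indeed, it recovers the best losses found by meta-learning. This leads to a generic recipe that can be used to find a good TTA loss for any given supervised training loss function of a general class. Empirically, our approach consistently dominates other baselines over a wide range of benchmarks. Our approach is particularly of interest when applied to classifiers trained with novel loss functions, e.g., the recently proposed PolyLoss, which differs substantially from an entropy-based loss. Further, we show that our approach can also be interpreted as a kind of self-training using a very specific soft label, which we refer to as the conjugate pseudo-label. Overall, our method provides a broad framework for better understanding and improving test-time adaptation. Code is available at https://github.com/locuslab/tta_conjugate.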
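For a classifier trained with cross-entropy, the temperature-scaled softmax-entropy mentioned above and the self-training view can be checked numerically on a single vector of logits. This sketch rests on an assumption: that in the cross-entropy case the soft label is the model's own temperature-scaled softmax output. With that choice, the soft-label cross-entropy equals the softmax-entropy. The function names are hypothetical and this is not the code from the linked repository.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, computed stably."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_entropy(logits, temperature=1.0):
    """Entropy of the (temperature-scaled) softmax prediction."""
    probs = softmax(logits, temperature)
    return -sum(p * math.log(p) for p in probs if p > 0)


def soft_label_cross_entropy(logits, soft_labels, temperature=1.0):
    """Cross-entropy between given soft labels and the softmax prediction."""
    probs = softmax(logits, temperature)
    return -sum(y * math.log(p) for y, p in zip(soft_labels, probs) if y > 0)
```

A confident prediction has lower entropy than a uniform one, so minimizing this loss on unlabeled test inputs pushes the model toward confident outputs. During adaptation the soft labels would be treated as constants when taking gradients. The sketch only evaluates the loss values.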